(German) Language Processing for Lucene

Author

  • Bastian Entrup
Abstract

This paper introduces an open-source Java package called German Language Processing for Lucene (glp4lucene). Although it was originally developed to work with German texts, it is to a large degree language independent. It aims to facilitate four language processing steps for working with non-English texts in Apache Lucene/Solr: lemmatizing words, weighting terms by their part of speech, adding synonyms, and decompounding nouns. None of these steps requires a thorough understanding of natural language processing.
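
The four steps named in the abstract can be illustrated with a minimal, dictionary-based sketch in plain Java. This is not the glp4lucene API and uses no Lucene classes: the class name `ToyGermanPipeline`, its methods, the tiny lexicons, and the weight values are all hypothetical and chosen only for illustration. A real system would presumably rely on full lexical resources and a part-of-speech tagger rather than lookup tables.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: hypothetical names and toy data, not the glp4lucene API.
class ToyGermanPipeline {
    // Toy lexicons (assumptions); real resources would be far larger.
    static final Map<String, String> LEMMAS = Map.of("autos", "auto", "bahnen", "bahn");
    static final Map<String, String> POS = Map.of("auto", "NOUN", "bahn", "NOUN", "schnell", "ADJ");
    static final Map<String, List<String>> SYNONYMS = Map.of("auto", List.of("wagen"));
    static final List<String> VOCABULARY = List.of("auto", "bahn");
    // Assumed weights: nouns count more than adjectives, everything else less.
    static final Map<String, Float> POS_WEIGHTS = Map.of("NOUN", 2.0f, "ADJ", 1.0f);
    static final float DEFAULT_WEIGHT = 0.5f;

    // Step 1: lowercase the token and map an inflected form to its lemma.
    public static String lemmatize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return LEMMAS.getOrDefault(lower, lower);
    }

    // Step 4: split a word into two known vocabulary words, if possible.
    public static List<String> decompound(String word) {
        for (int i = 1; i < word.length(); i++) {
            String head = word.substring(0, i);
            String tail = word.substring(i);
            if (VOCABULARY.contains(head) && VOCABULARY.contains(tail)) {
                return List.of(head, tail);
            }
        }
        return List.of(word); // not a recognizable compound
    }

    // Steps 2 and 3: weight each part by its part of speech and add synonyms.
    public static Map<String, Float> process(String token) {
        Map<String, Float> weighted = new LinkedHashMap<>();
        for (String part : decompound(lemmatize(token))) {
            String pos = POS.getOrDefault(part, "OTHER");
            float weight = POS_WEIGHTS.getOrDefault(pos, DEFAULT_WEIGHT);
            weighted.put(part, weight);
            // Synonyms inherit the weight of the term they expand.
            for (String synonym : SYNONYMS.getOrDefault(part, List.of())) {
                weighted.putIfAbsent(synonym, weight);
            }
        }
        return weighted;
    }

    public static void main(String[] args) {
        System.out.println(process("Autos"));
        System.out.println(process("Autobahn"));
    }
}
```

In this sketch, "Autos" is reduced to the lemma "auto" and expanded with the synonym "wagen", while "Autobahn" is split into "auto" and "bahn". Inside Lucene or Solr, comparable logic would normally live in token filters of an analysis chain rather than in static helper methods.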

Similar resources

Biomedical term normalization of EHRs with UMLS

This paper presents a novel prototype for biomedical term normalization of electronic health record excerpts with the Unified Medical Language System (UMLS) Metathesaurus. Despite being multilingual and cross-lingual by design, we first focus on processing clinical text in Spanish because there is no existing tool for this language and for this specific purpose. The tool is based on Apache Luce...

Using Wikipedia and Wiktionary in Domain-Specific Information Retrieval

The main objective of our experiments in the domain-specific track at CLEF 2008 is utilizing semantic knowledge from collaborative knowledge bases such as Wikipedia and Wiktionary to improve the effectiveness of information retrieval. While Wikipedia has already been used in IR, the application of Wiktionary in this task is new. We evaluate two retrieval models, i.e. SR-Text and SR-Word, based ...

Realtime Ad Hoc Search in Twitter: Know-Center at TREC Microblog Track 2011

In this paper, we outline our experiments carried out at the TREC Microblog Track 2011. Our system is based on a plain text index extracted from Tweets crawled from twitter.com. This index has been used to retrieve candidate Tweets for the given topics. The resulting Tweets were post-processed and then analyzed using three different approaches: (i) a burst detection approach, (ii) a hashtag ana...

Willingness to Communicate (WTC) among Beginning-level German Learners: Teaching German as a Foreign Language in a U.S. University Classroom

This action research examines the concept of Willingness to Communicate (WTC) in a second language acquisition context. The researcher investigated the contributors of WTC in a foreign language classroom setting. Therefore, a multiple assignments method and sequence was applied. Participants of this study were students who matriculated in a United States (U.S.) undergraduate program, studying G...

Publication date: 2015